Adult Content Filtering through Compression-Based Text Classification

Authors

  • Igor Santos
  • Patxi Galán-García
  • Aitor Santamaría-Ibirika
  • Borja Alonso-Isla
  • Iker Alabau-Sarasola
  • Pablo García Bringas
Abstract

The Internet is a powerful source of information. However, some of the information available on the Internet cannot be shown to every audience. For instance, pornography should not be shown to children. To this end, several text-filtering algorithms have been proposed that employ a Vector Space Model (VSM) representation of webpages. Nevertheless, these types of filters can be bypassed using different attacks. In this paper, we present the first adult content filtering tool that employs compression algorithms to represent data in a way that is resilient to these attacks. We show that this approach improves on the results of classic VSM models.
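
The abstract does not say which compression algorithm or decision rule the tool uses, so the following is only a minimal sketch of the general idea behind compression-based text classification. It assumes DEFLATE (Python's standard `zlib` module) and a "smallest growth in compressed size" rule: a document is assigned to the class whose training text it compresses best against. The `TRAINING` corpus, its two harmless topic labels, and the helper names are hypothetical stand-ins for real "allowed" and "blocked" page collections.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs for the UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def added_cost(corpus: str, doc: str) -> int:
    """Extra compressed bytes needed when doc is appended to corpus.

    If doc shares many substrings with corpus, the compressor can reuse
    them as back-references, so the extra cost stays small.
    """
    return compressed_size(corpus + " " + doc) - compressed_size(corpus)


def classify(doc: str, training: dict[str, str]) -> str:
    """Return the label whose training text makes doc cheapest to encode."""
    return min(training, key=lambda label: added_cost(training[label], doc))


# Hypothetical toy corpus; a real filter would use labeled web pages.
TRAINING = {
    "cooking": (
        "preheat the oven and bake the bread until golden. "
        "mix flour, sugar and butter for the cake. "
        "simmer the soup and season with salt and pepper."
    ),
    "networking": (
        "the router forwards packets between subnets. "
        "configure the firewall to block the port. "
        "the server sends a tcp packet to the client over the network."
    ),
}

if __name__ == "__main__":
    sample = "mix flour, sugar and butter for the cake and bake the bread until golden"
    print(classify(sample, TRAINING))
```

Because the compressor works on raw character sequences rather than on a fixed vocabulary of tokens, a sketch like this needs no tokenization step, which is one plausible reason such models are described as harder to evade than VSM-based filters. With only a few sentences of training text, the result is reliable only when the document overlaps heavily with one class.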

Similar resources

Combining Text And Image Analysis in The Web Filtering System "WEBGUARD"

Web applications increasingly utilize search techniques that heavily rely on content-based text and image analyses. For example, for parental site filtering, it is necessary to identify adult sites. These applications must rely on a semantic analysis of images in the process of identification where text analysis alone is insufficient. In this article, we describe our site filtering system "WebG...

Filtering of Undesired Messages from OSN User Space

One fundamental issue in today's On-line Social Networks (OSNs) is to give users the ability to control the messages posted on their own private space so that unwanted content is not displayed. Up to now, OSNs provide little support for this requirement. To fill the gap, in this paper, we propose a system allowing OSN users to have direct control over the messages posted on their walls. This is a...

Spam Filtering Using Compression Models

Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. This paper summarizes our experiments for...

Spam Filtering Using Statistical Data Compression Models

Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. In this paper, we investigate a novel app...

A Text Based Filtering System for OSN User Walls

As we know, today everyone is using On-line Social Networks (OSNs) to communicate and share information. Therefore, one important need in today's On-line Social Networks (OSNs) is to give users the ability to control the messages posted on their own private space so that unwanted content is not displayed. OSNs have provided little support for this requirement up to now. To provide this, we propose a sy...

Publication date: 2012